Variance reduction in feature hashing using MLE and control variate method


Abstract

The feature hashing algorithm introduced by Weinberger et al. (2009) is a popular dimensionality reduction algorithm that compresses high-dimensional data points into low-dimensional data points that closely approximate the pairwise inner product. This algorithm has been used in many fundamental machine learning applications, such as model compression (Chen et al. 2015), spam classification (Weinberger et al. 2009), compressing text classifiers (Joulin et al. 2016), and large-scale image classification (Mensink et al. 2012). However, a limitation of this approach is that the variance of its estimator for the inner product tends to be large for small values of the reduced dimension, making the estimate less reliable. We address this challenge and suggest two simple and practical solutions in this work. Our approach relies on the control variate (CV) method and maximum likelihood estimation (MLE), which are popular variance reduction techniques used in statistics. We show that these methods lead to a significant variance reduction in the similarity estimation. We give theoretical bounds on the same and complement them with extensive experiments on synthetic and real-world datasets. Given the simplicity and effectiveness of our approach, we hope that it can be adapted in practice.
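
As an illustration of the variance issue described in the abstract, the following is a minimal sketch of plain feature hashing, not the CV or MLE estimators proposed in the paper. It assumes fully random bucket and sign assignments drawn with NumPy in place of real hash functions; the names `feature_hash` and `inner_product_estimates`, the test vectors, and the chosen values of `k` are hypothetical and used only for illustration.

```python
import numpy as np

def feature_hash(x, bucket, sign, k):
    """Compress x (length d) into k dimensions.

    Coordinate i of x is multiplied by sign[i] and added to position bucket[i].
    """
    out = np.zeros(k)
    np.add.at(out, bucket, sign * x)  # unbuffered add, so colliding indices accumulate
    return out

def inner_product_estimates(x, y, k, trials, rng):
    """Return `trials` independent feature-hashing estimates of <x, y>."""
    d = x.shape[0]
    estimates = np.empty(trials)
    for t in range(trials):
        # Assumption: random draws stand in for the hash functions.
        bucket = rng.integers(0, k, size=d)      # bucket hash: {0..d-1} -> {0..k-1}
        sign = rng.choice([-1.0, 1.0], size=d)   # sign hash: {0..d-1} -> {-1, +1}
        hx = feature_hash(x, bucket, sign, k)
        hy = feature_hash(y, bucket, sign, k)
        estimates[t] = hx @ hy                   # estimate of <x, y> in k dimensions
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 1000
    x = rng.standard_normal(d)
    y = x + 0.5 * rng.standard_normal(d)  # a vector correlated with x
    exact = x @ y
    for k in (16, 64, 256):
        est = inner_product_estimates(x, y, k, trials=2000, rng=rng)
        print(f"k={k:4d}  exact={exact:.1f}  mean={est.mean():.1f}  variance={est.var():.1f}")
```

Running the script prints the empirical mean and variance of the estimate for each reduced dimension `k`. The mean should stay close to the exact inner product, while the variance should grow as `k` shrinks, which is the regime the abstract targets.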


Similar resources

Variance analysis of control variate technique and applications in Asian option pricing

This paper presents an analytical view of variance reduction by the control variate technique for pricing arithmetic Asian options as financial derivatives. In this paper, the effect of correlation between two random variables is shown. We propose an efficient method for choosing a suitable control in pricing arithmetic Asian options based on control variates (CV). The numerical experiment shows ...


Control variate method for stationary processes

The sample mean is one of the most natural estimators of the population mean based on an independent, identically distributed sample. However, if some control variate is available, it is known that the control variate method reduces the variance of the sample mean. The control variate method often assumes that the variable of interest and the control variable are i.i.d. Here we assume that these v...


Control-Variate Estimation Using Estimated Control Means

We study control variate estimation where the control mean itself is estimated. Control variate estimation in simulation experiments can significantly increase sampling efficiency, and has traditionally been restricted to cases where the control has a known mean. In a previous paper (Schmeiser, Taaffe, and Wang 2000), we generalized the idea of control variate estimation to the case where the c...


Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gene Expression

Nowadays, the growing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection, performed through filtering and wrapping, is one approach to dimensionality reduction. Wrapper methods are more accurate than filter methods, but filter methods run faster and have a lower computational burden. With ...



Journal

Journal title: Machine Learning

Year: 2022

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-022-06166-z